Goto

Collaborating Authors

 snr regime


Semantic Communication for Cooperative Perception using HARQ

arXiv.org Artificial Intelligence

Cooperative perception, offering a wider field of view than standalone perception, is becoming increasingly crucial in autonomous driving. This perception is enabled through vehicle-to-vehicle (V2V) communication, allowing connected automated vehicles (CAVs) to exchange sensor data, such as light detection and ranging (LiDAR) point clouds, thereby enhancing the collective understanding of the environment. In this paper, we leverage an importance map to distill critical semantic information, introducing a cooperative perception semantic communication framework that employs intermediate fusion. To counter the challenges posed by time-varying multipath fading, our approach incorporates the use of orthogonal frequency-division multiplexing (OFDM) along with channel estimation and equalization strategies. Furthermore, recognizing the necessity for reliable transmission, especially in the low SNR scenarios, we introduce a novel semantic error detection method that is integrated with our semantic communication framework in the spirit of hybrid automatic repeated request (HARQ). Simulation results show that our model surpasses the traditional separate source-channel coding methods in perception performance, both with and without HARQ. Additionally, in terms of throughput, our proposed HARQ schemes demonstrate superior efficiency to the conventional coding approaches. Yucheng Sheng, and Shi Jin are with the National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China (e-mail: shengyucheng@seu.edu.cn; Le Liang is with the National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China, and also with Purple Mountain Laboratories, Nanjing 211111, China (e-mail: lliang@seu.edu.cn). Hao Ye is with the Department of Electrical and Computer Engineering, University of California, Santa Cruz, CA 95064, USA (e-mail: yehao@ucsc.edu).


An Exponentially Increasing Step-size for Parameter Estimation in Statistical Models

arXiv.org Artificial Intelligence

Using gradient descent (GD) with fixed or decaying step-size is a standard practice in unconstrained optimization problems. However, when the loss function is only locally convex, such a step-size schedule artificially slows GD down as it cannot explore the flat curvature of the loss function. To overcome that issue, we propose to exponentially increase the step-size of the GD algorithm. Under homogeneous assumptions on the loss function, we demonstrate that the iterates of the proposed \emph{exponential step size gradient descent} (EGD) algorithm converge linearly to the optimal solution. Leveraging that optimization insight, we then consider using the EGD algorithm for solving parameter estimation under both regular and non-regular statistical models whose loss function becomes locally convex when the sample size goes to infinity. We demonstrate that the EGD iterates reach the final statistical radius within the true parameter after a logarithmic number of iterations, which is in stark contrast to a \emph{polynomial} number of iterations of the GD algorithm in non-regular statistical models. Therefore, the total computational complexity of the EGD algorithm is \emph{optimal} and exponentially cheaper than that of the GD for solving parameter estimation in non-regular statistical models while being comparable to that of the GD in regular statistical settings. To the best of our knowledge, it resolves a long-standing gap between statistical and algorithmic computational complexities of parameter estimation in non-regular statistical models. Finally, we provide targeted applications of the general theory to several classes of statistical models, including generalized linear models with polynomial link functions and location Gaussian mixture models.


Neural Decoding with Optimization of Node Activations

arXiv.org Artificial Intelligence

The problem of maximum likelihood decoding with a neural decoder for error-correcting code is considered. It is shown that the neural decoder can be improved with two novel loss terms on the node's activations. The first loss term imposes a sparse constraint on the node's activations. Whereas, the second loss term tried to mimic the node's activations from a teacher decoder which has better performance. The proposed method has the same run time complexity and model size as the neural Belief Propagation decoder, while improving the decoding performance by up to $1.1dB$ on BCH codes.


Classifying high-dimensional Gaussian mixtures: Where kernel methods fail and neural networks succeed

arXiv.org Machine Learning

A recent series of theoretical works showed that the dynamics of neural networks with a certain initialisation are well-captured by kernel methods. Concurrent empirical work demonstrated that kernel methods can come close to the performance of neural networks on some image classification tasks. These results raise the question of whether neural networks only learn successfully if kernels also learn successfully, despite neural networks being more expressive. Here, we show theoretically that two-layer neural networks (2LNN) with only a few hidden neurons can beat the performance of kernel learning on a simple Gaussian mixture classification task. We study the high-dimensional limit where the number of samples is linearly proportional to the input dimension, and show that while small 2LNN achieve near-optimal performance on this task, lazy training approaches such as random features and kernel methods do not. Our analysis is based on the derivation of a closed set of equations that track the learning dynamics of the 2LNN and thus allow to extract the asymptotic performance of the network as a function of signal-to-noise ratio and other hyperparameters. We finally illustrate how over-parametrising the neural network leads to faster convergence, but does not improve its final performance.


On the Minimax Optimality of the EM Algorithm for Learning Two-Component Mixed Linear Regression

arXiv.org Machine Learning

We study the convergence rates of the EM algorithm for learning two-component mixed linear regression under all regimes of signal-to-noise ratio (SNR). We resolve a long-standing question that many recent results have attempted to tackle: we completely characterize the convergence behavior of EM, and show that the EM algorithm achieves minimax optimal sample complexity under all SNR regimes. In particular, when the SNR is sufficiently large, the EM updates converge to the true parameter $\theta^{*}$ at the standard parametric convergence rate $\mathcal{O}((d/n)^{1/2})$ after $\mathcal{O}(\log(n/d))$ iterations. In the regime where the SNR is above $\mathcal{O}((d/n)^{1/4})$ and below some constant, the EM iterates converge to a $\mathcal{O}({\rm SNR}^{-1} (d/n)^{1/2})$ neighborhood of the true parameter, when the number of iterations is of the order $\mathcal{O}({\rm SNR}^{-2} \log(n/d))$. In the low SNR regime where the SNR is below $\mathcal{O}((d/n)^{1/4})$, we show that EM converges to a $\mathcal{O}((d/n)^{1/4})$ neighborhood of the true parameters, after $\mathcal{O}((n/d)^{1/2})$ iterations. Notably, these results are achieved under mild conditions of either random initialization or an efficiently computable local initialization. By providing tight convergence guarantees of the EM algorithm in middle-to-low SNR regimes, we fill the remaining gap in the literature, and significantly, reveal that in low SNR, EM changes rate, matching the $n^{-1/4}$ rate of the MLE, a behavior that previous work had been unable to show.


High SNR Consistent Compressive Sensing Without Signal and Noise Statistics

arXiv.org Machine Learning

Recovering the support of sparse vectors in underdetermined linear regression models, \textit{aka}, compressive sensing is important in many signal processing applications. High SNR consistency (HSC), i.e., the ability of a support recovery technique to correctly identify the support with increasing signal to noise ratio (SNR) is an increasingly popular criterion to qualify the high SNR optimality of support recovery techniques. The HSC results available in literature for support recovery techniques applicable to underdetermined linear regression models like least absolute shrinkage and selection operator (LASSO), orthogonal matching pursuit (OMP) etc. assume \textit{a priori} knowledge of noise variance or signal sparsity. However, both these parameters are unavailable in most practical applications. Further, it is extremely difficult to estimate noise variance or signal sparsity in underdetermined regression models. This limits the utility of existing HSC results. In this article, we propose two techniques, \textit{viz.}, residual ratio minimization (RRM) and residual ratio thresholding with adaptation (RRTA) to operate OMP algorithm without the \textit{a priroi} knowledge of noise variance and signal sparsity and establish their HSC analytically and numerically. To the best of our knowledge, these are the first and only noise statistics oblivious algorithms to report HSC in underdetermined regression models.


Adaptivity and Computation-Statistics Tradeoffs for Kernel and Distance based High Dimensional Two Sample Testing

arXiv.org Machine Learning

Nonparametric two sample testing is a decision theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. We refer to the most common settings as mean difference alternatives (MDA), for testing differences only in first moments, and general difference alternatives (GDA), which is about testing for any difference in distributions. A large number of test statistics have been proposed for both these settings. This paper connects three classes of statistics - high dimensional variants of Hotelling's t-test, statistics based on Reproducing Kernel Hilbert Spaces, and energy statistics based on pairwise distances. We ask the question: how much statistical power do popular kernel and distance based tests for GDA have when the unknown distributions differ in their means, compared to specialized tests for MDA? We formally characterize the power of popular tests for GDA like the Maximum Mean Discrepancy with the Gaussian kernel (gMMD) and bandwidth-dependent variants of the Energy Distance with the Euclidean norm (eED) in the high-dimensional MDA regime. Some practically important properties include (a) eED and gMMD have asymptotically equal power; furthermore they enjoy a free lunch because, while they are additionally consistent for GDA, they also have the same power as specialized high-dimensional t-test variants for MDA. All these tests are asymptotically optimal (including matching constants) under MDA for spherical covariances, according to simple lower bounds, (b) The power of gMMD is independent of the kernel bandwidth, as long as it is larger than the choice made by the median heuristic, (c) There is a clear and smooth computation-statistics tradeoff for linear-time, subquadratic-time and quadratic-time versions of these tests, with more computation resulting in higher power.


The Role of Principal Angles in Subspace Classification

arXiv.org Machine Learning

Abstract--Subspace models play an important role in a wide range of signal processing tasks, and this paper explores how the pairwise geometry of subspaces influences the probability of misclassification. When the mismatch between the signal and the model is vanishingly small, the probability of misclassification is determined by the product of the sines of the principal angles between subspaces. When the mismatch is more significant, the probability of misclassification is determined by the sum of the squares of the sines of the principal angles. Reliability of classification is derived in terms of the distribution of signal energy across principal vectors. Larger principal angles lead to smaller classification error, motivating a linear transform that optimizes principal angles. The transform presented here (TRAIT) preserves some specific characteristic of each individual class, and this approach is shown to be complementary to a previously developed transform (LRT) that enlarges inter-class distance while suppressing intra-class dispersion. Theoretical results are supported by demonstration of superior classification accuracy on synthetic and measured data even in the presence of significant model mismatch. IGNALS that are nominally high dimensional often exhibit a low dimensional geometric structure. For example, fixed-pose images of human faces are recorded using more than 1000 pixels, but can be represented by a 9-dimensional harmonic subspace [1]. Motion trajectories of a rigid body might be recorded by hundreds of sensors, but must intrinsically be represented by a 4-dimensional subspace [2]. There are many more examples where a low-dimensional subspace model captures intrinsic geometric structure, ranging from user ratings in a recommendation system [3] to signals emitted by multiple sources impinging at an antenna array [4]. The subspace geometry has assisted tasks of interest to both signal processing [5], [6] and machine learning communities [7], [8]. It can be used to approximate a nonlinear manifold by fitting mixture components to local patches of the manifold [5], [9], hence providing a high fidelity representation of a wide variety of signal geometries.